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Conventionally, multilevel analysis of covariance (ML-ANCOVA) has been the 
recommended approach for analyzing treatment effects in quasi-experimental multilevel 
designs with treatment application at the cluster-level. In this paper, we introduce the 
generalized ML-ANCOVA with linear effect functions that identifies average and conditional 
treatment effects in the presence of treatment-covariate interactions. We show how the 
generalized ML-ANCOVA model can be estimated with multigroup multilevel structural 
equation models that offer considerable advantages compared to traditional ML-ANCOVA. 
The proposed model takes into account measurement error in the covariates, sampling 
error in contextual covariates, treatment-covariate interactions, and stochastic predictors. 
We illustrate the implementation of ML-ANCOVA with an example from educational 
effectiveness research where we estimate average and conditional effects of early 
transition to secondary schooling on reading comprehension. 
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1. INTRODUCTION 

In the social sciences, analysis of covariance (ANCOVA) is one 
of the most important statistical techniques. It is used to analyze 
effects of an independent variable on an outcome variable con- 
trolling for a vector of covariates. In this article, we focus on the 
application of ANCOVA-like techniques in quasi-experimental 
multilevel designs with treatment implementation at the level 
of clusters. Multilevel structural equation models (Rabe-Hesketh 
et al, 2004; Mehta and Neale, 2005; Marsh et al., 2009; Liidtke 
et al., 2011) are used to extend traditional ANCOVA to account 
for the clustered structure in multilevel designs. In order to esti- 
mate causal effects in such designs, it is important to control 
for all relevant covariates, including covariates at the individual- 
level (e.g., student characteristics, intelligence, socio-economic 
status), contextual covariates (e.g., neighborhood-average socio- 
economic status, school-average achievement) and true cluster- 
level covariates (characteristics of the cluster, e.g., school 
resources, location). For extended discussions of causal effects 
in multilevel quasi-experiments, see Gitelman (2005), Hong and 
Raudenbush (2006), Sobel (2006), Hong and Raudenbush (2008), 
VanderWeele (2008), and Nagengast (2009). 

We propose a multigroup multilevel structural equation mod- 
eling approach (MG-ML-SEM approach; see Muthen, 2002) that 
extends conventional multilevel ANCOVA in several ways. The 
particular strengths of the MG-ML-SEM approach are: (1) It 
allows for latent covariates at the individual-level and at the 
cluster-level to control for measurement error; (2) it takes into 
account sampling error in the aggregation of covariates to the 
cluster-level; (3) it naturally includes interactions between the 
cluster-level treatment variable and (latent) covariates at both 



levels and (4) all predictor variables are treated as stochastic 
rather than fixed predictors. The MG-ML-SEM approach is illus- 
trated by an application from educational research in Germany 
(ELEMENT study, Lehmann and Lenkeit, 2008), where we esti- 
mate average and conditional effects of early transition to sec- 
ondary schooling on reading achievement. 

The paper is structured as follows: First, we introduce conven- 
tional multilevel ANCOVA for the analyses of quasi-experimental 
designs with treatment application at the cluster-level. Then, we 
discuss the shortcomings of this conventional approach. Next, we 
illustrate how the MG-ML-SEM approach can be used to over- 
come these shortcomings for estimating average and conditional 
treatment effects and compare several models of the MG-ML- 
SEM approach. In the discussion, we point out the assumptions 
required to estimate causal effects and discuss further research 
directions. 

2. MULTILEVEL DESIGNS WITH TREATMENT APPLICATION 
AT THE CLUSTER-LEVEL 

Multilevel designs with two or more treatment groups are 
typically differentiated along the dimensions of (a) the level 
of treatment application (at the individual- or the cluster- 
level) and (b) the treatment assignment process (randomized 
or non-randomized assignment) (e.g., Plewis and Hurry, 1998; 
Ukoumunne et al., 1999). In the remainder, we will focus on a 
class of multilevel designs that are of particular importance for 
educational research: Designs in which whole clusters are treated 
and there are systematic differences between clusters. Such quasi- 
experimental designs are particularly relevant when randomized 
assignment to conditions is not possible, for example when the 
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effects of differential learning environments or school types are 
studied. 

As in all quasi-experimental designs, observed group differ- 
ences do not reflect unbiased treatment effects in multilevel 
designs with non-random assignment at the cluster-level, because 
confounding variables can bias the estimates of treatment effects. 
Therefore it is crucial to measure and statistically control such 
confounders (e.g., Rosenbaum, 2002; Shadish et al, 2002). In the 
remainder, we call a potential confounder a covariate. 

Conventionally, multilevel ANCOVA has been the recom- 
mended approach for analyzing quasi-experimental multilevel 
designs with treatment application at the cluster-level (e.g., Plewis 
and Hurry, 1998; Seltzer, 2004). Multilevel ANCOVA for a design 
with two conditions includes an indicator for the treatment con- 
dition X and controls for k covariates Z\, , . . ,Z^. A typical 
implementation of an hierarchical two-level ANCOVA, for exam- 
ple given in Raudenbush and Bryk (2002), p. 26, Equation (2.16) 
is as follows: 

Y = Yoo + Yio • X + Y20 • Zi + . . . + y k0 ■ Z k + u+_r . (1) 

e 

If this model is appropriately specified, i.e., if all covariates are 
included and there are no unmodeled non-linear effects or treat- 
ment covariate interactions, yio is an estimate of the average 
treatment effect AVE (e.g., Aiken and West, 1996). In order to 
obtain correct standard errors, the model should be implemented 
as a hierarchical linear model to take the clustered structure of 
the design into account (e.g., Murray, 1998, 2001; Seltzer, 2004), 
i.e., the residual variable £ can be decomposed into a cluster- 
level component w, that reflects the residual variation between 
classrooms and an individual level-component r, that reflects 
variability of individual students around the values predicted by 
the covariates and their cluster. The model can be easily extended 
by including cluster-level covariates W to control for selection 
effects that operate at the cluster-level. 

3. LIMITATIONS OF CONVENTIONAL MULTILEVEL ANCOVA 

The conventional multilevel ANCOVA presented in Equation (1), 
however, has several drawbacks that limit its potential for the 
analyses of quasi-experimental multilevel designs with treatment 
application at the cluster-level. 

3.1. MEASUREMENT ERROR 

Conventional multilevel ANCOVA implemented in the hierarchi- 
cal linear model does not take measurement error into account, 
leading to potentially biased estimates of treatment effects (Cook 
et al, 2009; Culpepper and Aguinis, 2011; Lockwood and 
McCaffrey, 2014). It is well known that unreliably measured 
covariates can yield biased results in regression analysis (Degracie 
and Fuller, 1972; Carroll et al, 2010). This problem is exac- 
erbated in multilevel designs in educational and psychological 
research, where many covariates can only be measured unreliably 
and latent covariates can appear at several levels. The emergence 
of multilevel structural equation models (Rabe-Hesketh et al., 
2004; Mehta and Neale, 2005; Marsh et al., 2009; Ludtke et al, 
2011) allows for controlling measurement error in covariates 



(and outcomes) by including multiple indicators both at the 
individual- and cluster-level. Multilevel structural equation mod- 
els also allow for analyzing the factorial structure of latent vari- 
ables and tests of measurement invariance across levels (Mehta 
and Neale, 2005; Jak et al., 2013) — two assumptions of conven- 
tional multilevel ANCOVA that are not routinely tested (see also 
Schweig,2013). 

3.2. CONTEXTUAL EFFECTS AND SAMPLING ERROR 

In quasi-experimental designs with treatment application at 
the cluster-level, contextual covariates are especially important. 
Contextual covariates (sometimes also referred to as composi- 
tional variables, e.g., Harker and Tymms, 2004; Hutchison, 2007) 
reflect the composition of cluster-level units, for example the 
average achievement level in a classroom or the average socio- 
economic status in a neighborhood. Contextual covariates are 
particularly important covariates because it is very likely that they 
are associated with selection processes at the cluster-level. 

Formally, contextual covariates are conditional expectations of 
a unit-level covariate Z given the cluster variable C. Every unit- 
level covariate Z can be decomposed into a contextual covariate 
or between-cluster component Z\, and a residual or within- cluster 
component Z w : 

Z = E(Z\C)+Z W (2) 
= Z}j + Z w , 

where Z w = Z — E(Z \ C) (Ludtke et al., 2008; Snijders and 
Bosker, 2012). This decomposition shows that a contextual 
covariate Z/, is the regression of the individual-level covariate Z 
on the cluster-variable C. As such, all properties of a regression 
residual hold for the within-cluster component Z w . Notably, its 
expected value is equal to zero and it is regressively indepen- 
dent of the cluster variable C as well as of all functions of the 
cluster-variable such as other covariates at the cluster-level. 

The notion of "contextual effects" — the differential effects 
of the within-cluster and between-cluster components on the 
outcome variable — has long been discussed (Raudenbush and 
Willms, 1995). It is important to take account of such differential 
effects when controlling for cluster-level differences in multilevel 
models and it is critical to the unbiased estimation of treatment 
effects when selection into different treatments occurs at the clus- 
ter level. The conventional multilevel ANCOVA of Equation (1) 
does not make this distinction. The coefficient y^o represents 
the total effect of the covariate Z k that is a blend of the effects 
of Zki, and Z kv/ . The value of /jto depends on the intraclass- 
correlation-coefficient of the covariate Z k (Raudenbush and Bryk, 
2002; Snijders and Bosker, 2012). If only is included as covari- 
ate in an ANCOVA, but selection into treatment conditions varies 
as a function of Zy,, the adjusted effects will be biased (Nagengast, 
2009). 

In applications another difficulty arises: The values of con- 
textual covariates Z/, are often estimated by the manifest sam- 
ple mean in each cluster, assuming perfect reliability of cluster 
means. This assumption is unrealistic if there is only a sample 
of individuals taken from each cluster or if the individual-level 
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ratings measure a cluster-level construct such as classroom cli- 
mate (Marsh et al., 2012). Recently, new approaches have been 
developed that take the unreliability of cluster means into account 
(Croon and van Veldhoven, 2007; Liidtke et al., 2008; Shin and 
Raudenbush, 2010; Grilli and Rampichini, 2011). Marsh et al. 
(2009) and Liidtke et al. (2011) introduced doubly -latent mod- 
els that further extend these approaches by allowing for latent 
aggregation of latent variables measured by multiple indicators. 
They also demonstrated that the contextual effect, i.e., the effect 
of the contextual variable Zj, after controlling for the effects of 
the unit-level covariate Z, will be biased when sampling error is 
not controlled. This bias, that is similar to the bias due to mea- 
surement error, will also affect the estimation of treatment effects 
when a contextual covariate is included in the model. 

3.3. TREATMENT-COVARIATE INTERACTIONS 

The literature on ANCOVA for multilevel designs with treatment 
application at the cluster-level has been surprisingly sparse on the 
issue of including interactions between the treatment and vari- 
ous covariates at the unit- and the cluster-level (see, Plewis and 
Hurry, 1998; Pituch, 2001; Seltzer, 2004, for notable exceptions). 
Such interactions indicate that the effect of the treatment is not 
constant across all units and clusters, but depends on the values 
of individual- and cluster-level covariates. Interactions contain 
important information about the differential effectiveness of the 
treatment for subgroups of units or clusters. In the presence 
of interactions, researchers may consider conditional treatment 
effects (i.e., treatment effects given particular values of the covari- 
ate^) and/or the treatment), or the average treatment effect (i.e., 
the treatment effect obtained by averaging the conditional treat- 
ment effects over the unconditional distribution of covariates). 
In models ignoring important interactions, the regression coef- 
ficient for the treatment effect is an aggregate of the conditional 
treatment effects, but is not equal to the average treatment effect 
(see, Rogosa, 1980, for a detailed explanation in the context of 
single-level models). Using an example with a single covariate, 
Rogosa (1980) showed that the treatment effect obtained from 
traditional ANCOVA (misspecified by ignoring the interaction) is 
an estimate of the (Z = z ca )-conditional treatment effect, where 
z ca denotes the center of accuracy. The center of accuracy is the 
point, where the conditional variance of the effect function is 
minimal and it is not necessarily equal to the average of Z. 

3.4. STOCHASTIC REGRESS0RS 

The conventional hierarchical linear model explicitly assumes 
that the predictors within a sample are fixed quantities that do not 
vary from sample to sample (see e.g., Pinheiro and Bates, 2000; 
Raudenbush and Bryk, 2002; Snijders and Bosker, 2012). Hence, 
all inferences are conditional on the values of the set of observed 
covariates in the sample (Senn et al., 2007). While this assumption 
simplifies the implementation of the statistical estimation proce- 
dure, the case has been made that it is not appropriate for the 
analysis of quasi-experimental designs and observational stud- 
ies (Crager, 1987; Chen, 2006; Nagengast, 2006; Krohne, 2009) 
when unconditional inferences to the true distribution (and not 
to the sample distribution) of covariates are desired. In these 
designs, it is unlikely that the distribution of covariates in a 



sample would be identical in a replication of the study and treat- 
ing the covariates as fixed predictors is not appropriate. Krohne 
(2009) showed analytically and in simulation studies that stan- 
dard errors of average effects obtained from the conventional 
general linear model assuming fixed predictors will be biased in 
the presence of treatment-covariate interactions if the covariate 
is, in fact, a stochastic predictor (see also Sampson, 1974). While 
the problem of stochastic predictors and unconditional inference 
has received some attention in experimental design (Gatsonis and 
Sampson, 1989), correlation analysis (Shieh, 2006) and power 
analysis (Steiger and Fouladi, 1992) for the general linear model, 
the topic has not been studied widely for hierarchical linear 
models. 

4. GENERALIZED ANCOVA 

Before we discuss the MG-ML-SEM framework in more detail, we 
first introduce generalized ANCOVA (Steyer and Partchev, 2008) 
that was developed to overcome some of the issues of conven- 
tional ANCOVA in single-level designs. For a binary treatment 
indicator X and a multivariate covariate Z = (Z\, . . . , Z^), the 
regression of Y on X and Z can always be written as 

E(Y \X,Z)=g 0 (Z)+g l (Z)-X. (3) 

In this representation of the regression, the intercept function 
go(Z) describes the conditional regressive dependency of the 
outcome Y and the covariates in the control group (i.e., for 
X = 0). The values of the effect function g\ (Z) are the conditional 
treatment effects given particular values z of the covariate Z. 

In order to estimate treatment effects, one has to choose a 
parameterization for both go(Z) and g\ (Z). Often, linear param- 
eterizations are chosen for the intercept and the effect functions 
(e.g., Aiken and West, 1996) although other parameterizations, 
e.g., non-linear functions are also possible. Using only a single 
covariate Z and assuming linear functions for go (Z) and g\ (Z) 
yields 

E (Y | X, Z) = (yoo + Yio • Z) + ( Y oi + Yn ■ Z) • X. (4) 

Equation (4) extends conventional ANCOVA by including the 
interaction term yn. If there is an interaction effect, i.e., if 
y n j(z 0, treatment effects are not constant, but vary as a linear 
function of the covariate Z. 

Based on Equation (4), one can also obtain the average effect 
of the treatment (AVE) by taking the expectation of the effect 
function gi(Z): 

AVE = E (yoi +Yn -Z) (5) 
= Yoi + Yn -E(Z). 

Hence, in the presence of interaction effects, the AVE is no longer 
represented by a single parameter as in conventional ANCOVA, 
but is identified by a non-linear function of regression coefficients 
and the expected value of the covariate. If the expected value of 
the covariate £(Z) were included as a model parameter, e.g., in a 
multigroup structural equation model (Krohne, 2009; Nagengast, 
2009), the estimation of the AVE using Equation (5) could 
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take into account the uncertainty associated with the covariate 
means that is introduced when covariates are stochastic predic- 
tors (Sampson, 1974; Gatsonis and Sampson, 1989; Steiger and 
Fouladi, 1992; Chen, 2006; Shieh, 2006). In contrast, the more 
common approach of mean centering the covariate(s) to obtain 
average effects (Aiken and West, 1996) ignores this uncertainty. 
Based on Equation (4), one can also obtain conditional treatment 
effects CTE = gi (z) given particular values z of Z: 



accounts for the fact that the observed between-cluster variances 
and covariances of the predictors are biased estimators of the 
true between-cluster variances and covariances. It corrects the 
effects of the between-cluster covariance matrix for the effects of 
the within-cluster variances and covariances. Throughout this 
article, we will denote the latent aggregation approach with a 
superscript of an asterisk 1 . This decomposition is given by: 



gi(z) = Yoi + Yn -z, 



(6) 



which are also identified by non-linear functions of regression 
coefficients and a value z of Z. While generalized ANCOVA and 
its implementation into multigroup SEM solves some of the prob- 
lems of ANCOVA without interaction, further steps are necessary 
in order to account for contextual covariates and measurement 
error in quasi-experimental multilevel designs. 

5. MG-ML-SEM IMPLEMENTATION OF GENERALIZED 
ML-ANCOVA 

In the remainder of this paper, we introduce the MG-ML-SEM 
implementation of generalized ML-ANCOVA as an alternative 
to the conventional implementation of ML-ANCOVA models in 
hierarchical linear models. The MG-ML-SEM approach naturally 
overcomes the limitations of the conventional approach men- 
tioned above: (1) It controls for measurement error by including 
measurement models for multiple indicators of covariates and 
outcomes. (2) It easily allows the inclusion of contextual effects 
with the appropriate controls for sampling error by the latent 
aggregation approach. (3) It includes treatment-covariate inter- 
actions as a default as it is based on a multiple-group multilevel 
SEM model. (4) All predictors are treated as stochastic rather than 
fixed quantities. In addition, the MG-ML-SEM approach allows 
for group-specific variances of the dependent variable Y given 
covariates Z. 

5.1. MG-ML-SEM 

The implementation of generalized ML-ANCOVA is based on 
the multilevel structural equation model of Muthen (1989, 1994) 
and its extension to multigroup multilevel structural equation 
models (Muthen et al, 1997). Note that it is also possible to 
present and implement the model in the GLLAMM-framework 
(Rabe-Hesketh et al, 2004; Skrondal and Rabe-Hesketh, 2004). 
Rabe-Hesketh et al. (2012) provide a discussion of the advan- 
tages of different frameworks for multilevel structural equation 
modeling. 

The MG-ML-SEM decomposes the vector of manifest vari- 
ables (Y, Z, W b ) into the cluster-level variables (Y£, Z£ , W b ) and 
individual-specific variables (Y^,, Z^,, 0) (see Equation 2). The 
elements of these vectors can be modeled in two ways — either 
using group-mean centering of unit-level variables and the 
corresponding group means as additional predictors (Kreft et al., 
1995; Raudenbush and Bryk, 2002; Enders and Tofighi, 2007) 
or using a full-information latent aggregation approach (Liidtke 
et al., 2008). Similar to the correction of the between-cluster 
variance of the outcome variable in conventional multilevel 
models (e.g., Snijders and Bosker, 2012), the latter approach 




(7) 



where Y is the vector of manifest indicators of latent variables, 
Z is the vector of manifest covariates measured at the unit-level, 
and Wb is the vector of true cluster-level covariates. The vec- 
tor (Y^, Z^, Wb) contains the latent between-cluster components 
of the variables and the vector (Y^, Z^,, 0) contains the latent 
within-cluster components of the corresponding variables on the 
unit-level 2 . 

The MG-ML-SEM consists of (1) the group-specific within- 
cluster measurement model, (2) the group-specific between- 
cluster measurement model, (3) the group-specific within-cluster 
structural model, and (4) the group-specific between-cluster 
structural model: 



* J = A wx i/* + e w within-cluster measurement 
w ' modelforX = x 

Zj^ I = v x + Ay x r)* b + ey between-cluster measurement 
yWb/ model for X = x 

)/* = A x »/* + £ w within-cluster structural 

model for X = x 
rj* = + B x T)* h + £ fc between-cluster structural 
model for X = x 



See Muthen (2004) for details on the implementation of the MG- 
ML-SEM as a sampling model. 

Considering only one individual-specific covariate 
Z = Z*+ Z* , the structural model of the MG-ML-SEM is used 
to estimate the group-specific regressions E X=X (Y | Z b , Z* ): 



E x=x (Y | Z* b ,Zl) = E X = X (Y* b | Z* b ,Z* w ) 

+ E x = x (Y* w \Z* b ,Z* w ) 
= E x = x (Y*\Z* b )+E x = x (Yl\Z* w ) 
= PxO + Pad • Zl + a x i ■ Z* , 



(8) 



1 For the presentation of the MG-ML-SEM and generalized ML-ANCOVA, we 
use notation for the latent aggregation approach. The corresponding equa- 
tions for the manifest aggregation approach are identical except for the lack of 
the asterisk. 

2 True cluster-level covariates like school-funding do not have a within-cluster 
component. 
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where Yf, Z b are (possibly latent) variables in r) b , Y*, Z* are 
(possibly latent) variables in ij* , a x i is a within-cluster regression 
coefficient in A x , p\n is a between-cluster intercept in fixo, and p x i 
is a between-cluster regression coefficient in B x . 

5.2. GENERALIZED ML-ANCOVA 

The generalized ML-ANCOVA combines aspects of conventional 
ML-ANCOVA (by considering the nested structure and within- 
and between-cluster components of variables) and aspects of 
generalized single-level ANCOVA (by considering interactions 
between the treatment variable and covariates). 

The generalized ML-ANCOVA for a single covariate Z = Z b + 
Z* and a dichotomous treatment variable X with values 0 and 1 
is given by: 

E(Y | X,Z*,Z;) =go{Z*,Z* w ) + gl (Z* b ,Z* w ).X, (9) 

Since we want to analyze average and conditional treatment 
effects, our main interest lies in the effect function gi(Zf,Z*,). 
In the multi-group setting, the effect function can be com- 
puted as the difference between the two group-specific regressions 
E x =*(Y\Z* b ,Z*J: 

gl (Z* b ,Z* w ) =E X = 1 (Y\ Z* b ,Z* w ) -E X = °(Y\Z*, Z* w ) . (10) 

Inserting Equation (8) into (10) yields the effect function of gen- 
eralized ML-ANCOVA based on parameters of the MG-ML-SEM: 

gl (Z* b , Z* w ) = E x = 1 (7 | Z*, Z* w ) - E x=0 (Y | Z* b , Z* w ) (11) 
= (Pio - Poo) + (Pn - Poi) • Z* b + (a n - «oi) ■ Z* w 
= Yio + Yn • Z b + Y12 • Z* . 

This effect function can be used to compute average and condi- 
tional effects. In order to compute the average treatment effect, we 
first need to compute the unconditional expectations of covari- 
ates. The MG-ML-SEM only contains parameters for the condi- 
tional expectations of covariates given treatment group x, but the 
unconditional expectation can be computed by: 

E(Z) = E [E(Z | Q] = E(Z b ) = E [E(Z | X)] 

= M-oi • P (X = 0) + |Xn • P (X = 1) , (12) 

where |Xoi, [in are between-cluster "intercepts" of exogenous 
covariates in p x o, i.e., (Xoi> M-n are the group-specific true means 
(Xoi = E(Z | X = 0) and mi = E(Z \ X = 1). Then, the average 
effect is AVE = E[gi (Z*, Z* )] = y 10 + y n ■ E(Z). 

Generalized ML-ANCOVA can easily be extended to j + 1 
treatment groups and to include more covariates at the within- 
and at the between-cluster level, as well as pure cluster level 
covariates: 

E (Y | X, Z* b ,Z* w , W) = g 0 {Z* b ,Z* w , W) + gl (Z* h , Z* w , W) ■ 
Ix=i + -..+ g j(Z* b ,Z* w , W)-I X=J , 

(13) 



where Ix=j is an indicator for treatment condition;', and Z b , Z* , 
W are multivariate random variables. Based on Equation (13), 
we can consider average treatment effects E[ g j(Z b , Z* , W)] and 
conditional treatment effects £[g/(Z* ,Z*,W)\ /(Z* ,Z*,W, X)] 
given any function f(Z^, Z* , W, X). 

Next, we illustrate the MG-ML-SEM approach to general- 
ized ML-ANCOVA with a model from educational effectiveness 
research. Using a single dataset, we present the increasingly 
complex MG-ML-SEM models that show the features and the 
versatility of the approach. 

6. ILLUSTRATIVE EXAMPLE 

6.1. PARTICIPANTS AND PROCEDURE 

To illustrate the MG-ML-SEM, we use data from the ELEMENT 
study in Berlin, Germany (Lehmann and Lenkeit, 2008). 
ELEMENT is a three wave longitudinal study aimed at examining 
effects of early transition to secondary school (after 4th grade) on 
students' reading and mathematics proficiency. From 2002/2003 
to 2004/2005, a total of N = 4926 students were measured in 
4th grade, 5th grade, and 6th grade with several ability tests. 
N = 3169 students attended elementary school until the end of 
6th grade, whereas N = 1757 decided to make the transition to 
secondary school after 4th grade. 

In this article, we do not present a comprehensive analysis of 
the ELEMENT study. Instead, the primary goal of our paper is 
to illustrate the MG-ML-SEM approach as a means to estimate 
conditional and average effects in educational research. For didac- 
tic purposes, we restrict ourselves to reading comprehension as 
outcome, to a limited set of covariates (prior reading achieve- 
ment and interest in reading), to two occasions of measurement 
(4th grade and 6th grade), and to only one of the five data 
sets with imputed missing values provided by the Research Data 
Centre at the Institute for Educational Quality Improvement. 
A complete causal analysis would most likely require the inclu- 
sion of more covariates and/or propensity scores (see causal 
inference section in the discussion for details), and careful con- 
sideration of the sampling design and the missing data struc- 
ture (e.g., Baumert et al., 2009; Lehmann, 2010; Becker et al., 
2014). 

6.2. MEASURES 

6.2. 1. Reading comprehension 

The reading test used in the ELEMENT study was based on the 
theoretical framework of the IEA and OECD reading assessments. 
It contained items from PIRLS (Mullis et al., 2006) and from the 
LAU Study (Lehmann et al., 1997, 1999). Students' reading abil- 
ities were obtained as weighted likelihood estimates based on an 
IRT model. In order to obtain comparable scores across time, an 
anchor items design was used. In our analyses, we used test scores 
from 4th grade as covariate Z in order to control for pre-existing 
differences in reading, and we used test scores from 6th grade as 
outcome variable Y. 

6.2.2. Interest in reading 

The scale "interest in reading" consisted of five items. For our 
analyses including a latent covariate rj, we used the following three 
positively worded items as indicators of a common latent variable: 
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Vi (ItemAsf0902): 
V 2 (Item Asf0903): 

V 3 (ItemAsf0905): 



I like talking to others about books. 
I am pleased with a book received as a 
gift. 

I like reading. 



The response format was a four-point Likert scale with cate- 
gories "1 = strongly agree," "2 = agree," "3 = disagree," and 
"4 = strongly disagree." The three items were recoded so that 
higher values represent higher interest in reading. For the anal- 
yses including only manifest variables, we computed a scale V for 
"interest in reading" by taking the mean of the three recoded items 
for each student. 

6.2.3. Other variables 

Our treatment variable X is type of school at 6th grade with val- 
ues X = 0 (elementary school) and X = 1 (secondary school). 
We used class ID in 6th grade as cluster variable C. For some 
of the analyses ignoring latent aggregation, we created between- 
cluster components of all covariates by computing the empirical 
class means. We also created the within-cluster components by 
computing the difference between the corresponding variable and 
their empirical class means. This decomposition was done for 
reading comprehension at 4th grade Z, each of the three indica- 
tors of "interest in reading" Vi, V%, V3, and the scale for "interest 
in reading" V = ( V 1 + V 2 + V 3 )/3. 

7. MODELS 

We analyzed the data with six multigroup multilevel models in 
order to illustrate the different effect estimates and highlight the 
strengths of the MG-ML-SEM approach. In the simplest model 
Ml, we did not control for any covariates; in M2, we con- 
trolled for manifest covariates without distinguishing within and 
between components (as in a traditional multilevel ANCOVA 
model); in M3, we controlled for within and between components 
of manifest covariates using manifest aggregation; in M4, we con- 
trolled for within and between components of manifest covariates 
using latent aggregation; in M5, we controlled for within and 
between components of a latent and a manifest covariate using 
manifest aggregation; and finally in M6, we specified the full 
doubly latent model controlling for within and between compo- 
nents of a latent and a manifest covariate using latent aggregation. 
See Table 1 for an overview of models and their characteristics. 
We chose to present all six models in order to show differences 
in point estimates and standard errors between models in the 
context of our illustrative example. These analyses show con- 
siderable differences between models and demonstrate the need 



Table 1 | Overview of six multigroup multilevel structural equation 
models and their characteristics. 





M1 


M2 


M3 


M4 


M5 


M6 


Covariates 




S 










Contextual covariates 














Latent covariates 














Latent aggregation 















for a careful examination of treatment effects while meeting the 
requirements of complex multilevel designs. 

7.1. Ml: MG-ML-SEM WITHOUT COVARIATES 

The first MG-ML-SEM is a way of estimating the (unadjusted) 
means of reading comprehension from 6th grade Y in elementary 
school (X = 0) and secondary school (X = 1). The parameters in 
this model are the group-specific within and between variances 
of Y, and the group-specific means of Y. The within and between 
structural model for both treatment groups (X = 0 and X = 1 ) 
are given by: 

Y* = 0 + f w within structural model X = 0 

Y* = 0 + £ w within structural model X = 1 

Yy = Poo + Kb between structural model X = 0 

Yy = P10 + between structural model X = 1 

where Y* is the within-cluster component of Y and Yy is the 
between-cluster component of Y. The asterisk indicates that Y* 
and Yy are latent variables obtained by latent aggregation. In this 
model without covariates, the structural intercepts Poo and P10 are 
the group-specific means of Y, f w is a residual at the within level, 
and Kb is a residual at the between level. 

Based on the parameters of this model, we can compute the 
effect function (see Equation 11), which is identical with the 
average treatment effect in this model, because there are no 
covariates: 

AVE = p 10 - Poo 

Obviously, in the ELEMENT study as in other observational stud- 
ies, the AVE obtained from Ml does not reflect an unbiased 
estimate of the average causal effect of school type on reading 
comprehension from 6th grade. It is only in randomized con- 
trolled trials, that we can estimate the causal effect without bias 
using a model such as Ml, i.e., without controlling for covariates. 

7.2. M2: MG-ML-SEM MANIFEST COVARIATES/WITHOUT 
CONTEXTUAL COVARIATES 

In the second MG-ML-SEM, we implemented the conventional 
multilevel ANCOVA by adding two covariates to the model, 
namely reading comprehension from 4th grade Z and the scale 
score of interest in reading V, and consider the group-specific 
regressions E X=X (Y | Z, V) in the following MG-ML-SEM: 



0 + Otoi • Z w + 0(02 • V w + Kw 

within structural model X = 0 

0 + otn ■Z w + a n -V w + Kw 

within structural model X = 1 

Poo 
IJ-01 

H-02 

between structural model X 





0 
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{V\ /Pio\ /PlA /Pl2\ 

between structural model X = 1 

In this model we demonstrate the consequences of disregarding 
the decomposition of (V, Z) into within and between com- 
ponents. This decomposition was ignored by constraining the 
corresponding regression coefficients to be equal across levels, i.e., 

0(01 = Pol, a 02 = Po2, ail = Pll, 0(12 = Pl2- 

Based on the parameters of this model, we can compute the 
effect function g\ (Z, V) and the average effect AVE as follows (see 
Equations 11, 12 for details): 

gx (Z, V) = E X=1 {Y | Z, V) ~E X = 0 (Y | Z, V) 

= (Pio - Poo) + (Pn - Poi) • Z + (Pu - p 02 ) ■ V 
= Y10 + Y11 ■ Z + Y U -V 
AVE = (B 10 - poo) + (P11 - Poi) • E(Z) + (p 12 - p 02 ) • B(V). 

The average treatment effect of school type on reading compre- 
hension from 6th grade obtained from this model can be causally 
interpreted only under strong assumptions (see causal inference 
section in the discussion). If there are other confounders not 
included in the model or if there are contextual effects, the AVE 
from M2 must not be causally interpreted. 

7.3. M3: MG-ML-SEM MANIFEST COVARIATES/MANIFEST 
AGGREGATION 

In the third MG-ML-SEM, we add contextual covariates to 
the model, i.e., we decompose the manifest covariates (Z, V) 
into between-cluster components (Zfe, V;,) and within-cluster 
components (Z w , V w ). For this decomposition, we follow 
the group-mean centering approach assuming perfect relia- 
bility of manifest cluster means of covariates. In the ter- 
minology of Marsh et al. (2009), M3 is called a "Doubly- 
Manifest Model." We computed the values of the between- 
cluster variables (Z/,, Vj,) and the within-cluster variables 
(Z w , V w ) before fitting the model. We consider the group- 
specific regressions E X = X (Y \ Zb, V b , Z w , V w ) in the following 
MG-ML-SEM: 

Y* = 0 + 0(01 • Z w + a 02 ■ V w + £ w 
within structural model X = 0 



/n\ /pio\ /Pn\ /Pn\ 

between structural model X = 1 

Again, based on the parameters of M3, we can compute the effect 
function gi(Z b , Vb,Z w , V w ) and the average treatment effect as 
follows (see Equations 11, 12): 

g\{Zb, Vb, Z w , V w ) = (Pio - Poo) + (P11 - Poi) • Zf, 

+ (P12 - P02) • Vfc + (an - otoi ) • Z w 
+ (aw - a 02 ) • V w 
= Y10 + Y11 • Zb + Y12 • V b + yi3 • Z w 
+ Yi4 • V w 
AVE = (p 10 - Poo) + (P11 - Poi) • E(Z) 
+ (P12 - P02) • E(V) 

The treatment effects in this model depend on the values of the 
within-cluster components and the between-cluster components 
of covariates. Note that the equation for computing the AVE sim- 
plifies considerably, because the expectation of the within-cluster 
components of the covariates are zero, i.e., E{Z W ) = E(V W ) = 0 
(see section 3.2). The unconditional expectations of covariates 
are computed as shown in Equation (12). M3 is less restrictive 
compared to M2. It requires causal unbiasedness of the regres- 
sion E(Y I X, Zb, Vb, Z w , V w ), which means that there must not 
be any omitted confounders. 

7.4. M4: MG-ML-SEM MANIFEST COVARIATES/LATENT AGGREGATION 

Our fourth MG-ML-SEM is very similar to M3, with the only 
exception that contextual covariates are treated as latent vari- 
ables using the full-information latent aggregation approach 
(Liidtke et al., 2008) as indicated by the superscript of an asterisk 
(Z* , Z^,V*,V^) in the model equations for the fourth MG-ML- 
SEM. This model was first presented in Nagengast (2009): 

Y* = 0 + aoi • Z; + a 02 • V* + ? w 

within structural model X = 0 

Yl = 0 + an • 2* + a 12 • V* w + f w 

within structural model X = 1 

Y* b \ /Poo\ /PoA /P02\ (Slb\ 

Zt \ = L, + 0 Zf + 0 V? + 




0 + o(n • Z w + an • V w + f w 
within structural model X = 1 

P00\ /PoA /P 02 \ /?lA 

ixoi + 0 \z b + 0 \ V b + hb 

V 0 / \ 0 / \i; 3b j 

between structural model X = 0 
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yv;/ VnW V 0 / \ 0 / 

between structural model X = 0 

'n\ /pio\ /Pn\ /PiA 

between structural model X = 1 
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Marsh et al. (2009) termed this model a "Manifest- Latent Model." 
The computations of the gi(Z%, V^,Z*, V*) and the average 
treatment effect mimic the corresponding computations shown 
for M3 (see Equations 11, 12): 



gl (z* b ,v;,z* w ,v*j 



AVE 



(Pio — Poo) + (Pn — Poi) • Z\ 

+ (Pi2-Po2)-V£ + (a„-ccoi)-Z; 

+ (0tl2 -O02) • Vt 

Y10 + Y11 • Z* b + Y12 • Vt + Yi3 • K 
+ Y14- K 

(Pio-Poo) + (Pn-Poi)-£(2) 

+ (Pl2-P02)'£(V) 



7.5. M5: MG-ML-SEM LATENT COVARIATES/MANIFEST AGGREGATION 

In our fifth MG-ML-SEM, we take one step back and again apply 
manifest aggregation for covariates. Unlike previous models, we 
include a measurement model for the latent covariate "interest 
in reading," i.e., the three positively worded items Vi, V2, V3 are 
indicators of a latent construct r\. The three indicators are decom- 
posed into their within- and between-cluster components using 
manifest aggregation. In the terminology of Marsh et al. (2009), 
M5 is called a "Latent-Manifest Model." Particular advantages 
of explicitly including a latent covariate are: (1) one can appro- 
priately account for measurement error in the covariate, (2) the 
option to test measurement invariance across groups and across 
levels, and (3) the availability of fit indices to examine model fit. 
Adding a latent variable in the MG-ML-SEM approach requires 
the specification of a within-cluster measurement model and a 
between-cluster measurement model for both treatment groups. 
We assume measurement invariance across levels (Mehta and 
Neale, 2005; lak et al, 2013) and across groups (Meredith, 1993). 
In the model equations, we show the common within-cluster and 
the common between-cluster measurement models: 



'V lw \ 
V 2w 




<Y*b) 



z b 



within measurement model 
/OX /IX (e lb \ 
I v 2 1 + I x 2 1 m + I s 2 6 1 

\V3/ VW VW 
between measurement model 
0 + a 0 i • Z w + a 0 2 • r) w + £ w 
within structural model X = 0 
0 + an • Z w + an • r) w + f w 
within structural model X = 1 

/Poo N 

\lL02y 

between structural model X = 0 





Z b + 0 \ T) h + 



between structural model X = 1 



Average and conditional effects are computed as in the previous 
models. Note that we use the unconditional expectation of the 
latent covariate r) to compute the average treatment effect. E{rj) 
will depend on the chosen scale: In a r -congeneric measurement 
model as used in this example, r\ is uniquely defined up to positive 
linear transformations. We fixed the scale of r/ by fixing the first 
measurement intercept to Vi = 0 and the first loading to \\ = 1 
in both treatment groups 3 . Given a group-invariant measurement 
model, E{ij) can be computed: 

E(ri) = M.02 • P(X = 0) + 1112 • P(X = 1). 

The effect function and the AVE from M5 are then (see 
Equations 11, 12): 

gi(Z b , r) b ,Z w , rj w ) = (P10 - Poo) + (P11 - Poi) • Z h 

+ (P12 - P02) • m + (an - a 0 i) • Z w 
+ {an - a o2) • 1w 
= Y10 + Y11 • Zb + Y12 • m + Y13 • Z w 
+ Yi4 • 1 W 
AVE = (p 10 - Poo) + (P11 - Poi) • E(Z) 
+ (P12 - P02) • E{n) 

7.6. M6: MG-ML-SEM LATENT COVARIATES/LATENT AGGREGATION 

The sixth and final model is the full doubly latent MG-ML-SEM 
including latent covariates and latent aggregation. The only dif- 
ference compared to M5 is that the aggregation of pre-treatment 
reading comprehension Z and the three indicators of interest in 
reading is latent as indicated by the superscript of an asterisk. M6 
is specified by the following equations: 




within measurement model 

3 In principle, other ways of fixing the scale of r\, such as fixing the latent mean 
and variance, are possible as well. However, we recommend this approach in 
the context of multigroup multilevel SEM for analyzing average and condi- 
tional effects. Using the latent variance for setting the scale would complicate 
specification, because the variance needs to be specified appropriately in the 
multigroup-multilevel setting, which requires constraints across groups and 
across levels. Also, we do not want to fix the mean of latent variables in 
each of the groups, because this would lead to biased estimates of treatment 
effects. 
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lb 



2 b 



vV"; 



lb' 




zt = 



between measurement model 
0 + otoi • Z* + a 0 2 -n* w + ^w 
within structural model X = 0 
0 + an • Z* + an ■ »?* + f w 
within structural model X = 1 

(Poo 
Hoi 
1^02 

between structural model X 
/Pio 

\M<12 

between structural model _X = 1 




z,: + 




0 




Figure 1 illustrates M6. The upper path diagram of Figure 1 refers 
to treatment group X = 0 and the lower path diagram refers 
to X = 1. Each of the group-specific path diagrams is divided 
into three parts: the gray-shaded middle part shows the observed 
variables that are decomposed into contextual variables (upper 
part L2) and unit-level residuals (lower part LI). Based on the 
parameters of the doubly latent model shown in the model equa- 
tions above and displayed in Figure 1, we can compute the effect 
function and its expectation by (see Equations 11, 12): 



gi(Z* b , rf h ,Z* w , rf w ) 



ME : 



(Pio — Poo) + (Pn — Poi) • Zj* 

+ (P12 - P02) • it + (an - 001) • Z* 

+ (a i2 - a 02 ) ■ l* w 

Y10 + Y11 • Z* b + Y12 • rf b + Y13 • Z* 

+ Yi4 • rf w 

(Pro - Poo) + (P11 - Poi) ■ E(Z) 
+ (Pi2-Po 2 ) -EM 



Mplus syntax for M6 is given in the Supplementary Material. 
8. RESULTS 

The aim of our illustrative example is twofold: First, we demon- 
strate the consequences of estimating average effects with dif- 
ferent models. Second, we show how to use the MG-ML-SEM 
approach to estimate conditional effects and average effects of the 
treated and the untreated. We begin our presentation with the 
descriptive statistics of the central variables in our analyses. 




FIGURE 1 | Path diagram for the full doubly latent multigroup 
multilevel structural equation model. 



grade (the control group X = 0), whereas the right-hand columns 
depicts descriptive statistics for those students that made the early 
transition to secondary school after 4th grade (the treatment 
group, X = 1). Descriptively, the distributions of pre-treatment 
covariates are quite different between the two groups, which 
is expected in quasi-experimental designs like the ELEMENT 
study. 



8.1. DESCRIPTIVE STATISTICS 

Table 2 shows means, standard deviations, and intra-class corre- 
lation coefficients for all manifest variables used in our analyses. 
The left-hand columns depicts descriptive statistics for those 
students that stayed in elementary school until the end of 6th 



8.2. COMPARISON OF MODELS 

The first aim of our illustrative example was to examine whether 
model choice is important for assessing the effects of early transi- 
tion to secondary school on reading comprehension. To examine 
the differences between the models, we analyzed the data from 
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Table 2 | Means, standard deviations, and intra-class correlation 
coefficients for all manifest variables used in the analyses. 



Elementary school Secondary school 



Variable 


M 


SD 


ICC 


M 


SD 


ICC 


Y 


109.60 


12.71 


0.24 


123.40 


10.61 


0.10 


Z 


97.33 


15.12 


0.21 


113.84 


11.27 


0.09 


V 


2.98 


0.78 


0.04 


3.37 


0.58 


0.04 


v, 


2.31 


1.07 


0.02 


2.72 


0.97 


0.02 


v 2 


3.37 


0.90 


0.02 


3.65 


0.64 


0.02 


v 3 


3.27 


0.99 


0.03 


3.71 


0.63 


0.02 



Y: Reading comprehension 6th grade; Z: Reading comprehension 4th grade; V: 
Manifest scale interest in reading; V-\, V2, V3: Three positively worded items of 
scale interest in reading. 



the ELEMENT study with the six MG-ML-SEMs presented in the 
last section. All models were implemented using Mplus 7 (Muthen 
and Muthen, 2012). Table 3 shows the average treatment effect 
together with standard errors and effect sizes, the parameters of 
the conditional effect functions gi(Z), and group-specific contex- 
tual effects of covariates for all models. Effect sizes (ES) for the 
average effects were computed by dividing AVE by the standard 
deviation of Y given X = 0. The substantive conclusions drawn 
from the models are quite different, as will be discussed in detail 
in the following paragraphs. 

There was a strong and significant average treatment effect 
obtained from Ml (AVE = 13.59, 95%CT[12.10, 15.08], ES = 
1.07). In a randomized experiment, an applied researcher might 
conclude that, on average, early transition to secondary school 
was beneficial for students' reading comprehension. However, 
in the present quasi-experimental setting, the observed differ- 
ence in outcome means between treatment groups may be due 
to pre-existing differences. In fact, as shown in Table 2, those stu- 
dents who chose to make the early transition to secondary school 
already have higher values on reading comprehension in 4th grade 
and higher interest in reading on average. Therefore, we need 
to control for these pre-existing differences by conditioning on 
relevant covariates. 

In M2, we controlled for the manifest covariates "reading com- 
prehension in 4th grade" Z and "interest in reading" V without 
distinguishing within and between components of these covari- 
ates. The average treatment effect obtained from M2 (AVE = 
5.23, 95% CI [3.90, 6.47], ES = 0.43) was considerably lower 
compared to the AVE from Ml but is still significant. If there 
were no contextual effects and no other covariates introducing 
bias, one would conclude that early transition to secondary school 
positively affects students' reading comprehension on average. 

M3 is the first model in our sequence of models that separated 
the within-cluster and between-cluster components of covari- 
ates and estimates contextual effects. In our illustrative example, 
there was a significant contextual effect for reading comprehen- 
sion at 4th grade for those students that stayed in primary schools 
(CE ZY -x=o = 0.27, 95% CX[0.19, 0.35]), but no significant con- 
textual effect of reading comprehension at 4th grade for students 
that moved to secondary schools and no contextual effects for 



interest in reading in both groups (see Table 3). Substantively, the 
contextual effect showed that students with equal initial achieve- 
ment levels and equal interest in reading (i.e., given Z = z and 
V = v) profited from being schooled together with high achiev- 
ing students (i.e., high values on Z\, and V\,) in primary school, 
but not if they made an early transition to secondary school. In 
M3, where we took account of contextual covariates in the com- 
putation of the average treatment effect, the AVE was no longer 
significant (AVE = 1.76, 95% CI [-0.20, 3.72], ES = 0.14). 

All subsequent models (M4-M6) revealed a similar pattern of 
results: The AVE is not significant and there was a contextual 
effect of initial reading comprehension for students that stayed 
in primary school. The estimates from models M3-M6 ranged 
from AVE = 1.76 (95% CI [—0.20, 3.72], ES = 0.14; M3) to 0.14 
(95% CI [-2.57, 2.85], ES = 0.01; M4). Descriptively, the mod- 
els using latent aggregation of covariates (M4 and M6) show the 
lowest AVE. When the effect estimates were standardized using 
the standard deviation of the outcome in primary school, the 
results were very similar (see row ES in Table 3). With regard to 
standard errors, we find that the models including latent aggre- 
gation (M4 and M6) had the highest standard errors, followed 
by the models using manifest aggregation and contextual covari- 
ates (M3 and M5), and the model ignoring contextual covariates 
(M2) had the lowest standard error. M6 had the highest standard 
error, because it accounts for uncertainty in the aggregation of 
covariates and also includes latent covariates. 

In summary, the average total effects obtained from models 
M3-M6 were pretty similar, but they were substantially different 
from the apparent effects obtained from Ml and M2. The com- 
parison of models clearly shows that it is necessary to control for 
contextual covariates, if there are contextual effects. 

8.3. DOUBLY-LATENT MODEL M6 

In order to further illustrate the advantages of the MG-ML-SEM 
approach, we present a more detailed analysis of conditional and 
average effects based on the doubly-latent model M6. Although 
there were only minor differences between M3 and M6 in our 
illustrative example, we generally prefer M6, because it overcomes 
all of the limitations of conventional ANCOVA mentioned in the 
introduction. We note, however, that there might be situations 
in which partial correction models may be more appropriate for 
substantive or statistical reasons (Ltidtke et al, 2008, 2011; Marsh 
etal, 2012). 

Detailed results for all parameters of the doubly-latent model 
M6 and model fit information are given in Table 4. The 
effect function g\(Z), the average treatment effect AVE, con- 
textual effects, and all conditional effects considered in sub- 
sequent paragraphs, were based on these parameters. Figure 1 
shows a path diagram of the full model. Next, we demon- 
strate how the MG-ML-SEM approach could be used to 
answer substantive research questions using our preferred 
model (M6) as case example. In particular, we show how the 
MG-ML-SEM approach can be used to examine average effects 
as well as conditional treatment effects given certain values of 
covariates. 

In our illustrative example, the average effect is not the only 
interesting quantity from a substantive point of view. When 
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Table 3 | Average and conditional effects obtained from six multigroup multilevel structural equation models. 



Parameter M1 


M2 




M3 




M4 




M5 




M6 




Est. SE 


Est. 


SE 


Est. 


SE 


Est. 


SE 


Est. 


SE 


Est. 


SE 






no 13.59 0.76 


16.43 


3.36 


22.17 


9.77 


11.67 


15.82 


11.99 


11.28 


10.14 


15.75 


Y1i 


-0.12 


0.03 


-0.21 


0.12 


-0.13 


0.16 


-0.15 


0.12 


-0.14 


0.21 


Y12 


0.40 


0.50 


0.44 


2.78 


0.55 


5.11 


1.88 


3.48 


1.60 


9.27 


Y13 


-0.12 


0.03 


-0.10 


0.03 


-0.10 


0.03 


-0.10 


0.03 


-0.09 


0.03 


Y14 


0.40 


0.50 


0.36 


0.52 


0.27 


0.52 


0.44 


0.92 


0.34 


0.90 


AVE 13.59 0.76 


5.23 


0.63 


1.76 


1.00 


0.14 


1.38 


1.33 


1.12 


0.15 


1.41 


ES 1 .07 


0.43 




0.14 




0.01 




0.10 




0.01 




GROUP-SPECIFIC CONTEXTUAL EFFECTS OF COVARIATES 
















CE Z Y;X=0 






0.27 


0.04 


0.31 


0.04 


0.28 


0.03 


0.31 


0.04 


CEzy-,x=-\ 






0.16 


0.13 


0.28 


0.17 


0.23 


0.12 


0.27 


0.22 


CEvY ; x=o 






1.09 


1.09 


1.46 


2.26 


-0.79 


1.42 


0.77 


3.11 


CEvy-x=-\ 






1.17 


2.66 


1.73 


4.70 


0.64 


3.20 


2.03 


8.74 



The effect function is given by g-j (Z) = y ; o + y J; Z b + y f 2 V b + yi3Z w + yj4 V w (in models with latent variables, V is replaced by r\; in models using latent aggregation 
of covariates, a superscript of an asterisk is added to covariates). The values of the effect function are the (Z = z)-conditional effects of early transition to secondary 
school. The average treatment effect is AVE = E[gi (Z)]. ES denotes the effect size for the average effect obtained by dividing AVE by the standard deviation of Y 
given X = 0. Contextual effects are the differential effects of the within-cluster and between-cluster components of the covariate on the outcome variable. Contextual 
effects are calculated by subtracting the within-coefficient from the between-coefficient given X = x, i.e., CEzy.x=x = Px; — <*xi snd CEyY;X=x = Px2 — <*x2- In Ml 
and M2, there are no contextual effects. 



evaluating an intervention it is not only of interest if there is 
an average effect, but also for whom the intervention is ben- 
eficial or even harmful. Early transition to secondary school 
might not have a significant average effect, but could still affect 
specific students, e.g., students with high or low values of pre- 
treatment covariates. For example, high achieving students might 
particularly benefit from an early transition to secondary school- 
ing, whereas the early transition might negatively impact stu- 
dents with low initial achievement. The MG-ML-SEM approach 
can be used to study such conditional effects. In addition, 
it is also possible to study average treatment effects for cer- 
tain sub-populations. From a policy perspective, it might be 
interesting to study the effect of the early transition on the 
group of students actually choosing early transition to sec- 
ondary school (this effect has also be termed the "effect on 
the treated"). Or one might be interested in whether early 
transition to secondary school would have a beneficial effect 
for those who did not take this opportunity ("effect on the 
untreated"). 

Prior to estimating M6 for computing average and conditional 
effects, we tested for measurement invariance of "interest in read- 
ing" by comparing a model with invariance across groups and 
across levels (x 2 = 81.57, df = 14, p < 0.001; BIC = 28192.23; 
RMSEA = 0.05; CFI = 0.97; TLI = 0.98) with a model where 
invariance was not assumed (x 2 = 1.69, df = 6,p = 0.95; BIC = 
28198.01; RMSEA = 0.00; CFI = 1.00; TLI = 1.00) 4 . While the 
X 2 -test of model fit was significant for the model with invariance, 



4 Note that the fit indices differ from the ones reported in Table 4, because 
to test measurement invariance, we just specified the measurement model 



the other fit measures indicated a good fitting model, and the 
BIC was lower for the model with invariance. We concluded that 
the more parsimonious model with measurement invariance is 
adequate. 

Based on M6, we first tested the null hypothesis that all condi- 
tional effects were equal using a Wald test, i.e., Ho '• Yll = Y12 = 
y 13 = y 14 = 0, where yxi to 714 are regression coefficients in the 
effect function (cf. Table 3). Average effects on the treated and 
the untreated will differ from the average treatment effect only if 
the conditional effects are not constant. It is one of the particular 
strengths of the MG-ML-SEM approach that it allows tests of the 
treatment-covariate interactions and does not presume that con- 
ditional treatment effects are constant. Based on the Wald test, we 
rejected the null hypothesis (x 2 = 14.25, df = 4, p = 0.0065) of 
constant treatment effects. 

Next, we considered conditional treatment effects, such 
as the conditional effect of early transition to secondary 
school on the treated (CTET). The CTET is the condi- 
tional expectation of the effect function given X = 1, i.e., 
CTET=E[gi(Z*, Z*, r\\, rf w ) | X = 1]. It can be computed based 
on the parameters of the model (see Table 4): 

E[g!(Z* b , Z* w , 4, n* w )\X=l]= yio + Yll • E(Z* b I X = 1) 

+ Yn-E(4 \ X=l) 
= (P10 — Poo) + (P11 — P01 ) • H-n 
+ (P12 — P02) • M-12 

for reading interest and omitted the single-item reading comprehension 
measures. 
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Table 4 | Results obtained from the full doubly latent multigroup 
multilevel structural equation model. 







Model parameters 




X = 


0 


X 


= 1 


Within 


Between 


Within 


Between 






V1 




0.00* 




0.00* 


V 2 




0.81 (0.10) 




0.81 (0.10) 


V3 




0.09 (0.15) 




0.09 (0.15) 


X.1 


1.00* 


1.00* 


1.00* 


1.00* 


X.2 


1.07 (0.04) 


1.07 (0.04) 


1.07 (0.04) 


1 .07 (0.04) 


X 3 


1 .36 (0.06) 


1 .36 (0.06) 


1 .36 (0.06) 


1.36 (0.06) 


Var(e,) 


0.86 


0.00* 


0.76 


0.00* 


Var( E2 ) 


0.44 


0.00* 


0.22 


0.00* 


Var( E3 ) 


0.37 


0.00* 


0.12 


0.00* 




otxl 


0.52 (0.01) 




0.42 (0.02) 




<*x2 


2.31 (0.41) 




2.65 (0.79) 




PxO 




21.62 (6.47) 




31.76 (14.34) 


Pxl 




0.83 (0.04) 




0.70 (0.21) 


Px2 




3.08 (2.98) 




4.68 (8.75) 


Hx1 




97.40 (0.65) 




113.64(0.52) 


V-x2 




2.34 (0.02) 




2.67 (0.03) 


Var(t) 


69.54 


2.58 


77.44 


5.11 




X 2 (30) = 


175.60 








RMSEA = 


= 0.04 








SRMR (within) = 0.04 








SRMR (between) = 0.10 









Parameters with a superscript of an asterisk are fixed parameters, v; to are 
invariant intercepts; Xj to are invariant loadings; Var(ei ) — Varies) denote vari- 
ances of measurement error variables; a x1 -a X 2 are regression coefficients at the 
within-cluster level; fi x ;-p X 2 are regression coefficients at the between-cluster 
level; m-xJ and \l X 2 are group-specific means of covariates; Var(f ) denotes the 
variance of the regression residuals. 

using the fact that the conditional expectations of within-cluster 
components Z* and rf w given X = x is zero 5 . In our illustra- 
tive example, the CTET was not significant (CTET = —0.92, 
95% CI [-3.00, 1.16]). This implies that the early transition to 
secondary schooling was not beneficial for the group of students 
that actually underwent this transition. If this effect was causally 
interpretable (which would require controlling a larger set of 
covariates), it would imply that there were no beneficial effects of 
the supposedly enriched learning environment of the secondary 
school for the group of students that went there. These students 
would have obtained similar reading outcomes in the 6th grade 
after attending primary schools. 



5 E(Z* I X = x) = E(r\l I X = x) = 0, because Z* and r/* are denned as 
residuals with respect to the regressor C (see section about contextual effects 
and sampling error). Therefore, E[Z* | /(C)] = £[?)* | /(C)] = 0 for any 
function /(C) of C and the treatment variable X is such a function of C in 
designs with treatment application at the cluster-level considered in this paper. 



Similarly, the conditional treatment effect on the untreated 
(CTEUT) is the effect that early transition to secondary school 
would have on students who stay in elementary school until 
the end of sixth grade. The CTEUT can be computed as the 
conditional expectation of the effect function given X = 0: 

E[gi(Zt, Z* w , 4, n * w ) I X = 0] = Yio + Y11 • E(Zt I * = 0) 

+ Y12 • E(4 \X = 0) 
= (P10 — Poo) + On — P01) • M-oi 
+ (P12 — P02) • M-02 • 

In our analyses, the CTEUT is also not significant (CTEUT = 
0.74, 95% CI [-3.28, 4.76]). Again, under the assumption that 
all relevant covariates had been controlled, this would imply that 
students who stayed in primary school longer would not have 
profited from an earlier transition to secondary schooling with 
respect to their reading proficiency. 

Furthermore, we can consider conditional effects for spe- 
cific values of covariates. For example, we might be interested 
in the conditional effect for an average elementary school stu- 
dent (Z = 97.40, r] = 2.34) in an average secondary school class 
(Z* = 113.64, x]\ = 2.67), i.e., with Z* = Z - Z* = -16.24 and 
lt=1-V* b = "0.33: 

gi(z) = Y10 + Y11 • 1 13.64 + Y12 -2.67 + Y13 •(- 16.24) 
+ Yi4-(-0.33), 

which yields CTE = 0.49 (95% CI [-1.96, 2.94]). Or we might 
be interested in the conditional effect for an average secondary 
school student (Z = 113.64, r\ = 2.67) in an average elementary 
school class (Z* = 97.40, rf h = 2.34), i.e., with Z* = Z - Z* = 
16.24 and r/* = rj - rj* b = 0.33: 

gi (z) = Yio + Y11 • 97.40 + Y12 • 2.34 + y 13 • 16.24 + y u ■ 0.33, 

which yields CTE = -0.66 (95% CI [-4.95, 3.63]). Both of these 
conditional effects are not significant. 

9. DISCUSSION 

In this paper, we presented the MG-ML-SEM implementation of 
generalized ML-ANCOVA for the analysis of quasi-experimental 
multilevel designs with non-randomized assignment at the 
cluster-level. We demonstrated the flexibility of this approach 
and its potential for the analyses of average and conditional 
treatment effects using data from a German educational study 
focusing on early transitions to secondary schooling. The MG- 
ML-SEM approach overcomes the limitations of conventional 
multilevel ANCOVA by (1) accounting for measurement error, 
(2) systematically including contextual covariates with appropri- 
ate control for sampling error, (3) naturally including treatment- 
covariate interactions as a default and (4) treating predictors as 
stochastic rather than fixed. We now discuss some limitations of 
our example and focus on the conditions for interpreting average 
and conditional effect estimates obtained from generalized mul- 
tilevel ANCOVA as causal effects. 
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9.1. CAUSAL INFERENCE 

Causal effects are crucial in theories in the social sciences. Almost 
all theories include statements that can be formalized as state- 
ments about causal effects of a variable X on a variable Y. Many 
researchers have the idea that a causal treatment effect is what is 
estimated by a mean difference (between treatment and control) 
in a randomized experiment. Although it turns out that this idea 
is not wrong, it does not help much in situations in which it is not 
possible or desirable to randomly assign persons/clusters to treat- 
ments. How and under which conditions can we estimate a causal 
effect? The general conclusion is that specific techniques of causal 
modeling are indispensable, whenever we are beyond total effects 
in a randomized experiment. 

Approaches to causality include Rubin's causal model (Rubin, 
1974), Pearl's graphical approach (Pearl, 2009), the stochastic the- 
ory of causal effects (Steyer et al., 2014), or Dawid's approach 
(Dawid, 2011). These theories of causality provide definitions of 
causal effects and point out the assumptions required to estimate 
average and conditional causal effects from empirically estimable 
conditional expectations in experimental and quasi-experimental 
designs. 

A requirement of a causal analysis is a temporal structure, i.e., 
the focused cause has to be prior to the outcome, and the covari- 
ates have to be prior or simultaneous to the cause, so that the 
covariates can not be affected by the cause. This distinguishes 
covariates from potential mediators. The temporal structure of 
random variables can either be described by a directed acyclic 
graph (DAG; Pearl, 2009) or using a filtration (Steyer et al., 
2014). In the ELEMENT study, the longitudinal aspect of the 
study makes such a time order plausible: The covariates "interest 
in reading" and "pre-test reading comprehension" are measured 
at 4th grade, prior to the focused cause "early transition to 
secondary school," and the outcome "post-test reading compre- 
hension" is measured at 6th grade which is posterior to early 
transition. 

One strategy for identifying causal effects is to control for all 
confounders. In informal terms, a confounder is a covariate that 
will bias the effects of the focused cause on the outcome, if we do 
not control for it appropriately (see Steyer et al., 2014, for a def- 
inition of unbiasedness). Applied researchers are encouraged to 
test whether a covariate is a confounder or not: In Pearl's theory, 
the (conditional) independence statements implied by the DAG 
can be tested. Based on a correctly specified DAG, the researcher 
can then read off the covariates that need to be controlled in a 
causal analysis using the backdoor criterion (Pearl, 1993). In the 
stochastic theory of causal effects, there are several testable causal- 
ity conditions that imply unbiasedness (Steyer et al., 2000, 2014). 
For example, the independent cause condition is defined as con- 
ditional independence of the cause and all covariates given the 
selected covariates in the model (see CC1Z in Steyer et al, 2014). 
The regressively independent outcome condition (see CC2Z in 
Steyer et al., 2014) is defined as conditional regressive indepen- 
dence of the outcome variable from all covariates given the cause 
and the selected covariates in the model. Both of these causal- 
ity conditions imply conditional unbiasedness and are testable 
in the sense that the corresponding conditional independence 
statements can be falsified in empirical studies. 



Once we have identified all relevant confounders, i.e., once we 
have selected the covariates such that one of the causality condi- 
tions holds, we have to control for these covariates in order to esti- 
mate causal effects and not just mere associations. Based on the 
theories of causality mentioned above and work by others (e.g., 
Rosenbaum and Rubin, 1983; Robins, 1999; Shadish et al, 2002), 
several techniques have been developed to control for poten- 
tial confounders, e.g., ANCOVA adjustment, propensity scores, 
weighting techniques, matching, subclassification, marginal 
structural models and many more (see Schafer and Kang, 2008, 
for an overview). In this article, we suggested an approach to ana- 
lyze data from the ELEMENT study that extends ANCOVA tech- 
niques in several ways. Some of the other techniques could also 
be applied for analyzing data in quasi-experimental multilevel 
designs (see Baumert et al., 2009; Becker et al., 2014 for appli- 
cations of propensity score matching to the ELEMENT data). 

As mentioned previously, the primary goal of our paper was to 
illustrate the MG-ML-SEM approach as a means to estimate aver- 
age and conditional (total) effects in educational research. The 
effects obtained from models M1-M6 presented in this article 
can only be causally interpreted if the corresponding regressions 
are unbiased. For example, the average effect in Ml requires 
unbiasedness of the conditional expectations E(Y | X=x), which 
is very unlikely to hold in the ELEMENT study. It would 
only be plausible in a randomized experiment. M2 requires 
unbiasedness of the group-specific regressions E X=X (Y \ Z, V), 
and M3 requires unbiasedness of the group-specific regressions 
E X=X (Y \ Zb, Vy, Z w , V w ). Similar arguments apply to the other 
three models presented in this paper. A careful causal analysis 
would require including more potential confounders and testing 
the causality conditions. 

9.2. LIMITATIONS AND DIRECTIONS FOR FURTHER RESEARCH 

As we only compared our models on an example dataset, there 
was no way to pick the correct model from the set of models 
presented. Obviously M6, the full doubly-latent model, controls 
for shortcomings of conventional multilevel ANCOVA, but other 
models could also be defended on substantive and statistical 
grounds. For example, M5 that does not use latent aggregation 
could also be appropriate as the data in the ELEMENT study 
has been obtained from complete samples of the students within 
classrooms in Grade 4. In this case, there is only a small pro- 
portion of sampling error in the aggregation of covariates to the 
classroom level (Ltidtke et al., 2008; Marsh et al., 2012), e.g., due 
to students missing the assessment date. Similarly, M4 and M5 
might be more efficient in estimating the average treatment effect 
as the full doubly-latent model M6 has been shown to yield very 
variable estimates (Ltidtke et al., 201 1). However, as illustrated in 
the empirical examples, controlling for sampling error (and mea- 
surement error) only had minor effects on the estimated effects. 
It was the separation into within- and between-components of 
the covariates that mattered most. This separation, in turn, was 
justified by the presence of contextual effects of achievement in 
the control group. In applications, bigger differences between 
the models are expected when sampling and measurement error 
increase or when contextual effects are more pronounced. In these 
cases, the bias of the contextual effect will be relatively larger 
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in M3 compared to the full doubly-latent model (Ltidtke et al., 
2011). 

Further research on the MG-ML-SEM implementation of 
the generalized ML-ANCOVA should study more carefully the 
required sample size at both the within- and the between-level. 
Simulation studies suggest that there are circumstances where a 
sample size of at least 10 students within a class are required 
for reliably estimating contextual and average effects with M4 
(Liidtke et al., 2008; Nagengast, 2009) and large sample sizes at 
the class-level might be required for the doubly-latent model to 
be sufficiently accurate (Liidtke et al., 2011). Further simulation 
work is needed to extend these findings to the MG-ML-SEM case 
and clarify what sample sizes are required for reliable inferences 
about average and conditional effects in this framework. 

Further developments of the MG-ML-SEM framework 
will likely yield additional options for the application and 
development of generalized multilevel ANCOVA. In particular, 
accounting for cross-classified multilevel structures in multilevel 
structural equation models would be a major step forward. By 
cross-classified structure, we refer to a multilevel structure, which 
is not strictly hierarchical, but can be more complex (for details 
and two examples of a cross-classified structure, see Raudenbush, 
1993). Properly accounting for the cross-classified structure by 
considering different clustering structures for the covariates and 
the outcomes would be particularly useful for studies such as 
ELEMENT were students move from one school type to another. 
In addition, the further development of multilevel structural 
equation models with latent interactions (e.g., Leite and Zuo, 
201 1; Nagengast et al, 2013; Schermelleh-Engel et al, 2014) could 
allow to include covariate-covariate interactions and other non- 
linear effects of latent covariates in the effect functions. Similarly, 
the development of three-level multilevel structural equation 
models will yield further opportunities for extending the mod- 
els presented and allow researchers to address the complexities of 
multilevel quasi-experimental designs more comprehensively. 

In this article, we provided applied researchers with a com- 
prehensive toolbox to analyze average and conditional effects in 
non-randomized multilevel designs. We hope that this presenta- 
tion encourages researchers to apply these advanced techniques to 
address issues of measurement error, sampling error, contextual 
effects, treatment-covariate interactions and ultimately causal 
effects in the analyses of multilevel quasi-experimental designs. 

ACKNOWLEDGMENTS 

Parts of this paper are based on the dissertation of the 
author Benjamin Nagengast. We thank Rainer Lehmann and 
the Research Data Centre (Forschungsdatenzentrum, FDZ) at 
the Institute for Educational Quality Improvement (Institut 
zur Qualitatsentwicklung im Bildungswesen, IQB) for pro- 
viding the dataset. We acknowledge support by Deutsche 
Forschungsgemeinschaft and Open Access Publishing Fund of the 
University of Tubingen. 

SUPPLEMENTARY MATERIAL 

The Supplementary Material for this article can be found online 
at: http://www.frontiersin.org/journal/10.3389/fpsyg.2014.003 
04/abstract 



REFERENCES 

Aiken, L. S., and West, S. G. (1996). Multiple Regression: Testing and Interpreting 
Interactions. Thousand Oaks, CA: Sage. 

Baumert, J., Becker, M., Neumann, M., and Nikolova, R. (2009). Fruhiibergang 
in ein grundstandiges Gymnasium - Ubergang in ein privilegiertes 
Entwicklungsmilieu? [ Early transition into the academic track of sec- 
ondary schooling - transition into a privileged learning environment?]. Z. 
Erzieh. 12, 189-215. doi: 10.1007/sll618-009-0072-4 

Becker, M., Neumann, M., Tetzner, J., Bose, S., Knoppick, H., Maaz, K., et al. (2014). 
Is early ability grouping good for high -achieving students' psychosocial devel- 
opment? Effects of the transition into academically selective schools. /. Educ. 
Psychol doi: 10.1037/a0035425 

Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2010). 
Measurement Error in Nonlinear Models: A Modern Perspective. Boca Raton, FL: 
Chapman and Hall/CRC. 

Chen, X. (2006). The adjustment of random baseline measurements in 
treatment effect estimation. /. Stat. Plan. Infer. 136, 4161-4175. doi: 
10.1016/j.jspi.2005.08.046 

Cook, T. D., Steiner, P. M., and Pohl, S. (2009). How bias reduction is affected 
by covariate choice, unreliability, and mode of data analysis: results from two 
types of within-study comparisons. Multivariate Behav. Res. 44, 828-847. doi: 
10.1080/00273170903333673 

Crager, M. R. (1987). Analysis of covariance in parallel-group clinical trials with 
pretreatment baselines. Biometrics 43, 895-901. doi: 10.2307/2531543 

Croon, M. A., and van Veldhoven, M. J. R M. (2007). Predicting group-level out- 
come variables from variables measured at the individual level: a latent variable 
multilevel model. Psychol Methods 12, 45-57. doi: 10.1037/1082-989X.12.1.45 

Culpepper, S. A., and Aguinis, H. (201 1). Using analysis of covariance (ANCOVA) 
with fallible covariates. Psychol. Methods 16, 166-178. doi: 10.1037/a0023355 

Dawid, A. P. (2011). Fundamentals of Statistical Causality. Research report, 
University of Sheffield. 

Degrade, J. S., and Fuller, W. A. (1972). Estimation of the slope and analysis of 
covariance when the concomitant variable is measured with error. /. Am. Stat. 
Assoc. 67, 930-937. doi: 10.1080/01621459.1972.10481321 

Enders, C. K., and Tofighi, D. (2007). Centering predictor variables in cross- 
sectional multilevel models: a new look at an old issue. Psychol. Methods 12, 
121-138. doi: 10.1037/1082-989X.12.2.121 

Gatsonis, C, and Sampson, A. R. (1989). Multiple correlation: exact power 
and sample size calculations. Psychol. Bull 106, 516-524. doi: 10.1037/0033- 
2909.106.3.516 

Gitelman, A. I. (2005). Estimating causal effects from multilevel group -alio cation 
data./. Educ. Behav. Stat. 30, 397-412. doi: 10.3102/10769986030004397 

Grilli, L., and Rampichini, C. (2011). The role of sample cluster means in multilevel 
models: a view on endogeneity and measurement error issues. Methodology 7, 
121-133. doi: 10.1027/1614-2241/a000030 

Harker, R., and Tymms, P. (2004). The effects of student composi- 
tion on school outcomes. School Eff. School Improv. 15, 177-199. doi: 
10.1076/sesi.l5.2. 177.30432 

Hong, G., and Raudenbush, S. W. (2006). Evaluating kindergarten retention policy: 
a case study of causal inferrence for multilevel observational data. /. Am. Stat. 
Assoc. 101,901-910. doi: 10.1198/016214506000000447 

Hong, G., and Raudenbush, S. W. (2008). Causal inference for time- 
varying instructional treatments. /. Educ. Behav. Stat. 33, 333-362. doi: 
10.3102/1076998607307355 

Hutchison, D. (2007). When is a compositional effect not a compositional effect? 
Qual. Quan. 41,219-232. doi: 10.1007/slll35-007-9094-2 

Jak, S., Oort, F. J., and Dolan, C. V. (2013). A test for cluster bias: detecting vio- 
lations of measurement invariance across clusters in multilevel data. Struct. Eq. 
Model. Multidisc. J. 20, 265-282. doi: 10.1080/10705511.2013.769392 

Kreft, I. G., de Leeuw, J., and Aiken, L. S. (1995). The effect of different forms of 
centering in hierarchical linear models. Multivariate Behav. Res. 30, 1-21. doi: 
10.1207/sl5327906mbr3001_l 

Krohne, U. (2009). Estimation of Average Causal Effects in Quasi-Experimental 
Designs: Non-Linear Constraints in Structural Equation Models. Ph.D. thesis, 
Friedrich-Schiller-University Jena, Germany. 

Lehmann, R. (2010). "Langeres gemeinsames Lernen. Erschwert oder erleichtert 
es den Ubergang in weiterfuhrende Schularten? in Ubergdnge im Schulwesen, 
eds S. Lin-Klitzing, D.-S. Di Fuccia, and G. Muller-Frerich (Bad Heilbrunn: 
Klinkhardt), 127-139. 



Frontiers in Psychology | Quantitative Psychology and Measurement 



April 2014 | Volume 5 | Article 304 | 14 



Mayer et al. 



Analyzing effects with multigroup multilevel SEM 



Lehmann, R., Gansfufi, R., and Peek, R. (1999). Aspekte der Lernausgangslage und 
der Lernentwicklung von Schulerinnen und Schulern an Hamburger Schulen - 
Klassenstufe 7. Hamburg: Behdrde fur Schule, Jugend und Berufsbildung, Amt 
fur Schule. 

Lehmann, R. H., and Lenkeit, J. (2008). Element. Erhebung zum Lese- und 
Mathematikverstandnis. Entwicklungen in den Jahrgangsstufen 4 bis 6 in Berlin: 
Abschlussbericht uber die Untersuehungen 2003, 2004 und 2005 an Berliner 
Grundschulen und grundstdndigen Gymnasien. Berlin: Humboldt Universitat zu 
Berlin. 

Lehmann, R. H., Peek, R., and Gansfufi, R. (1997). Aspekte der Lernausgangslage von 
Schulerinnen und Schulern der fiinften Klassen an Hamburger Schulen. Bericht 
uber die Untersuchung im September 1996. Hamburg: Behorde fur Schule, 
Jugend und Berufsbildung, Amt fur Schule. 

Leite, W. L., and Zuo, Y. (2011). Modeling latent interactions at level 2 in multilevel 
structural equation models: an evaluation of mean-centered and residual- 
centered unconstrained approaches. Struct. Eq. Model. Multidisc. J. 18, 449-464. 
doi: 1 0. 1080/107055 1 1 .20 1 1 .582400 

Lockwood, J. R., and McCaffrey, D. F. (2014). Correcting for test score measure- 
ment error in ANCOVA models for estimating treatment effects. /. Educ. Behav. 
Stat. 39, 22-52. doi: 10.3102/1076998613509405 

Liidtke, O., Marsh, H. W., Robitzsch, A., and Trautwein, U. (2011). A 2CE2 
taxonomy of multilevel latent contextual models: accuracy-bias trade-offs in 
full and partial error correction models. Psychol. Methods 16, 444-467. doi: 
10.1037/a0024376 

Liidtke, O., Marsh, H. W., Robitzsch, A., Trautwein, U., Asparouhov, T., and 
Muthen, B. O. (2008). The multilevel latent covariate model: a new, more reli- 
able approach to group-level effects in contextual studies. Psychol. Methods 13, 
203-229. doi: 10.1037/a0012869 

Marsh, H. W., Liidtke, O., Nagengast, B., Trautwein, U., Morin, A. J. S., Abduljabbar, 
A. S., et al. (2012). Classroom climate and contextual effects: conceptual and 
methodological issues in the evaluation of group-level effects. Educ. Psychol. 47, 
106-124. doi: 10.1080/00461520.2012.670488 

Marsh, H. W., Liidtke, O., Robitzsch, A., Trautwein, U., Asparouhov, T., 
Muthen, B., et al. (2009). Doubly-latent models of school contextual effects: 
integrating multilevel and structural equation approaches to control mea- 
surement and sampling error. Multivariate Behav. Res. 44, 764-802. doi: 
10. 1080/00273 170903333665 

Mehta, P. D., and Neale, M. C. (2005). People are variables too: mul- 
tilevel structural equation modeling. Psychol. Methods 10, 259-284. doi: 
10. 1080/00273 170903333665 

Meredith, W. (1993). Measurement invariance, factor analysis and factorial invari- 
ance. Psychometrika 58, 525-543. doi: 10.1007/BF02294825 

Mullis, I. V. S., Kennedy, A. M., Martin, M. O., and Sainsbury, M. (2006). PIRLS 
2006 Assessment Framework and Specifications. Boston, MA: TIMSS 8c PIRLS 
International Study Center, Lynch School of Education, Boston College. 

Murray, D. M. (1998). Design and Analysis of Group-Randomized Trials. New York, 
NY: Oxford University Press. 

Murray, D. M. (2001). Statistical models appropriate for designs often used in 
group-randomized trials. Stat. Med. 20, 1373-1385. doi: 10.1002/sim.675 

Muthen, B. O. (1989). Latent variable modeling in heterogeneous populations. 
Psychometrika 54, 557-585. doi: 10.1007/BF02296397 

Muthen, B. O. (1994). Multilevel covariance structure analysis. Sociol. Methods Res. 
22, 378-398. 

Muthen, B. O. (1998-2004). Mplus Technical Appendices. Los Angeles, CA: Muthen 
& Muthen. 

Muthen, B. O. (2002). Beyond SEM: general latent variable modeling. 

Behaviormetrika 29, 81-117. doi: 10.2333/bhmk.29.81 
Muthen, B. O., Khoo, S.-T, and Gustafsson, I. E. (1997). Multilevel Latent 

Variable Modeling in Multiple Populations. Technical report, Graduate School 

of Education 8c Information Studies, University of California, Los Angeles. 
Muthen, L. K., and Muthen, B. O. (1998-2012). Mplus User's Guide, 7th Edn. Los 

Angeles, CA: Muthen 8c Muthen. 
Nagengast, B. (2006). Standard Errors of ACE Estimates: Comparing Adjusted Group 

Means Against the Adjusted Grand Mean. A Simulation Study. Unpublished 

Diploma thesis, University of Jena, Jena, Thuringia, Germany. 
Nagengast, B. (2009). Causal Inference in Multilevel Models. Ph.D. thesis, Friedrich- 

Schiller-Universitat Jena, Thiiringen, Germany. 
Nagengast, B., Trautwein, U., Kelava, A., and Liidtke, O. (2013). Synergistic 

effects of expectancy and value on homework engagement: the case for 



a within-person perspective. Multivariate Behav. Res. 48, 428-460. doi: 

10. 1080/00273 1 7 1 .20 1 3.775060 
Pearl, J. (1993). "Aspects of graphical models connected with causality," in 

Proceedings of the 49th Session of the International Statistical Institute. Tome LV, 

Bookl (Florence, Italy). 
Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd Edn. Cambridge, 

UK: Cambridge University Press, doi: 10.1017/CBO9780511803161 
Pinheiro, J. C, and Bates, D. M. (2000). Mixed-Effects Models in S and S-PLUS. New 

York, NY: Springer, doi: 10.1007/978-1-4419-0318-1 
Pituch, K. A. (2001). Using multilevel modeling in large-scale planned variation 

educational experiments: improving understanding of intervention effects. /. 

Exp. Educ. 69, 347-373. doi: 10.1080/00220970109599492 
Plewis, I., and Hurry, J. (1998). A multilevel perspective on the design and anal- 
ysis of intervention studies. Educ. Res. Eval. 4, 13-26. doi: 10.1076/edre.4.1.13. 

13014 

Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2004). Generalized mul- 
tilevel structural equation modeling. Psychometrika 69, 167-190. doi: 

10.1007/BF02295939 
Rabe-Hesketh, S., Skrondal, A., and Zheng, X. (2012). "Generalized multilevel 

structural equation modeling," in Handbook of Structural Equation Modeling, 

ed R. Hoyle (New York, NY: Guilford Press), 512-531. 
Raudenbush, S. W. (1993). A crossed random effects model for unbalanced data 

with applications in cross-sectional and longitudinal research. /. Educ. Behav. 

Stat. 18, 321-349. doi: 10.3102/10769986018004321 
Raudenbush, S. W., and Bryk, A. S., (eds.). (2002). Hierarchical Linear Models: 

Applications and Data Analysis Methods, 2nd Edn. Thousand Oaks, CA: Sage. 
Raudenbush, S. W., and Willms, J. D. (1995). The estimation of school effects. /. 

Educ. Behav. Stat. 20, 307-335. doi: 10.3102/10769986020004307 
Robins, J. M. (1999). Association, causation, and marginal structural models. 

Synthese 121, 151-179. doi: 10.1023/A:1005285815569 
Rogosa, D. (1980). Comparing nonparallel regression lines. Psychol. Bull. 88, 

307-321. doi: 10.1037/0033-2909.88.2.307 
Rosenbaum, P. R. (2002). Covariance adjustment in randomized experiments and 

observational studies. Stat. Sci. 17, 286-304. doi: 10.1214/ss/1042727942 
Rosenbaum, P. R., and Rubin, D. B. (1983). The central role of the propensity 

score in observational studies for causal effects. Biometrika 70, 41-55. doi: 

10.1093/biomet/70.1.41 
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and 

nonrandomized studies. /. Edu. Psychol. 66, 688-701. doi: 10.1037/h0037350 
Sampson, A. R. (1974). A tale of two regressions. /. Am. Stat. Assoc. 69, 682-689. 

doi: 10.1080/01621459.1974.10480189 
Schafer, J. L., and Kang, J. D. Y. (2008). Average causal effects from observational 

studies: a practical guide and simulated example. Psychol. Methods 13, 279-313. 

doi: 10.1080/01621459.1974.10480189 
Schermelleh-Engel, K., Kerwer, M., and Klein, A. G. (2014). Evaluation of model fit 

in nonlinear multilevel structural equation modeling. Front. Psychol. 5:181. doi: 

10.3389/fpsyg.2014.00181 
Schweig, J. (2013). Cross-level measurement invariance in school and classroom 

environment surveys. Educ. Eval. Policy Anal, doi: 10.3102/0162373713509880. 

[Epub ahead of print] . 
Seltzer, M. (2004). "The use of hierarchical models in analyzing data from 

experiments and quasi-experiments conducted in field settings," in The SAGE 

Handbook of Quantitative Methodology for the Social Sciences, ed D. Kaplan 

(Thousand Oaks, CA: Sage). 
Senn, S-, Graf, E., and Caputo, A. (2007). Stratification for the propensity score 

compared with linear regression techniques to assess the effect of treatment or 

exposure. Stat. Med. 26, 5529-5544. doi: 10.1002/sim.3133 
Shadish, W. R., Cook, T. D., and Campbell, D. T. (2002). Experimental and Quasi- 
Experimental Designs for Generalized Causal Inference. Boston, MA: Houghton 

Mifflin. 

Shieh, G. (2006). Exact interval estimation, power calculation, and sample size 

determination in normal correlational analysis. Psychometrika 71, 529-540. doi: 

10.1007/sll336-04-1221-6 
Shin, Y., and Raudenbush, S. W. (2010). A latent cluster-mean approach to the 

contextual effects model with missing data. /. Educ. Behav. Stat. 35, 26-53. doi: 

10.3102/1076998609345252 
Skrondal, A., and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: 

Multilevel, Longitudinal and Structural Equation Models. Boca Raton, FL: 

Chapman 8c Hall, doi: 10.1201/9780203489437 



www.frontiersin.org 



April 2014 | Volume 5 | Article 304 | 15 



Mayer et al. 



Analyzing effects with multigroup multilevel SEM 



Snijders, T. A., and Bosker, R. J. (2012). Multilevel Analysis: An Introduction to Basic 
and Advanced Multilevel Modeling. London: Sage. 

Sobel, M. E. (2006). What do randomized studies of housing mobility demon- 
strate?: causal inference in the face of interference. /. Am. Stat. Assoc. 101, 
1398-1407. doi: 10.1198/016214506000000636 

Steiger, J. H., and Fouladi, R. T. (1992). R2: a computer program for interval esti- 
mation, power calculations, sample size estimation, and hypothesis testing in 
multiple regression. Behav. Res. Methods lnstrum. Comput. 24, 580-582. doi: 
10.3758/BF03203611 

Steyer, R., Gabler, S., von Davier, A. A., and Nachtigall, C. (2000). Causal 
regression models II: unconfoundedness and causal unbiasedness. Methods 
Psychol. Res. Online 5, 55-87. Available online at: http://www.dgps.de/ 
fachgruppen/methoden/mpr-online/issuell/art4/steyerCRII.pdf 

Steyer, R., Mayer, A., and Fiege, C. (2014). "Causal inference on total, direct, and 
indirect effects," in Encyclopedia of Quality of Life and Well-Being Research, ed A. 
C. Michalos (Dodrecht, Netherlands: Springer), 606-631. doi: 10.1007/978-94- 
007-0753-5_295 

Steyer, R., and Partchev, I. (2008). EffectLite for Mplus: a Program for the Uni- 
and Multivariate Analysis of Unconditional, Conditional and Average Mean 
Differences Between Groups. [ Computer Software and Manual]. Available online 
at: www.statlite.com (Retrieved May 5, 2008). 

Ukoumunne, C. C, Gulliford, M. C, Chinn, S., Sterne, J. A. C, and Burney, P. G. J. 
(1999). Methods for evaluating area-wide and organisation based interventions 



in health and health care: a systematic review. Health Technol. Assess. 
3, 1-99. 

VanderWeele, T. J. (2008). Ignorability and stability assumptions in neigh- 
borhood effects research. Stat. Med. 27, 1934-1943. doi: 10.1002/ 
sim.3139 

Conflict of Interest Statement: The authors declare that the research was con- 
ducted in the absence of any commercial or financial relationships that could be 
construed as a potential conflict of interest. 

Received: 15 November 2013; paper pending published: 24 January 2014; accepted: 24 
March 2014; published online: 23 April 2014. 

Citation: Mayer A, Nagengast B, Fletcher J and Steyer R (2014) Analyzing average 
and conditional effects with multigroup multilevel structural equation models. Front. 
Psychol. 5:304. doi: 10.3389/fpsyg.2014.00304 

This article was submitted to Quantitative Psychology and Measurement, a section of 
the journal Frontiers in Psychology. 

Copyright © 2014 Mayer, Nagengast, Fletcher and Steyer. This is an open-access 
article distributed under the terms of the Creative Commons Attribution License 
(CC BY). The use, distribution or reproduction in other forums is permitted, provided 
the original author(s) or licensor are credited and that the original publication in this 
journal is cited, in accordance with accepted academic practice. No use, distribution or 
reproduction is permitted which does not comply with these terms. 



Frontiers in Psychology | Quantitative Psychology and Measurement April 2014 | Volume 5 | Article 304 | 16 



